(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 902 395 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication:
17.03.1999 Bulletin 1999/11

(51) Int. Cl.6: G06T 7/00

(21) Application number: 98111042.2

(22) Date of filing: 16.06.1998



(84) Designated Contracting States:
AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV MK RO SI

(30) Priority: 22.07.1997 JP 195611/97

(71) Applicant:
ATR HUMAN INFORMATION PROCESSING RESEARCH LABORATORIES
Soraku-gun, Kyoto 619-02 (JP)

(72) Inventors:
• Kinoshita, Keisuke
ATR Human Inf Proc Res Lab
Seika-cho, Soraku-gun, Kyoto (JP)
• Zhang, Zhengyou
ATR Human Inf Proc Res Lab
Seika-cho, Soraku-gun, Kyoto (JP)

(74) Representative:
Prüfer, Lutz H., Dipl.-Phys. et al
PRÜFER & PARTNER GbR,
Patentanwälte,
Harthauser Strasse 25d
81545 München (DE)



(54) Linear estimation method for three-dimensional position with affine camera correction



(57) A plurality of cameras (1, 2, ..., N) acquire images of a plurality of reference points located at known positions in a three-dimensional space, an image processor (10) obtains the coordinates of the projected points thereof on the respective images, and a plurality of affine cameras having a linear relation between the three-dimensional space and the images are assumed, for calculating how the affine cameras project the respective reference points and correcting the coordinates of the observed projected points to be consistent with the points projected by the affine cameras.
Description

BACKGROUND OF THE INVENTION 
Field of the Invention

[0001] The present invention relates to a linear estimation method for a three-dimensional position with affine camera correction, and more particularly, it relates to a linear estimation method for a three-dimensional position with affine camera correction for estimating the three-dimensional position of an object point in a three-dimensional space from images acquired by a plurality of cameras in case of controlling a robot or the like with image information.

Description of the Prior Art 

[0002] A method of acquiring an image of a point (object point) existing in a three-dimensional space with a camera and estimating its three-dimensional position is a central problem of computer vision. Stereoscopy can be regarded as the most basic technique for solving this problem. In stereoscopy, two cameras located at previously known positions in previously known orientations acquire images of an object point, and its three-dimensional position is decided from the projected images by the principle of triangulation. In such stereoscopy, it is necessary to correctly measure the positions, orientations and focal lengths of the cameras. This is called camera calibration, which has been widely studied in the fields of computer vision and robotics. In this case, perspective projection is generally employed to describe the relation between the three-dimensional space and the images.
[0003] This perspective projection model can be regarded as an ideal model for general cameras. Despite its correctness, however, this projective model is nonlinear. Due to this non-linearity, three-dimensional position estimation is sensitive to computation errors and to measurement errors of the projected points.

[0004] Studies have been made on approximating the perspective projection model with a camera model which has better properties. For example, "Geometric Camera Calibration using Systems of Linear Equations" by Gremban, Thorpe and Kanade, International Conference on Robotics and Automation, pp. 562-567 (1988) applies approximation of a camera model to camera calibration. Thereafter the study of an affine camera model has been developed in "Self-Calibration of an Affine Camera from Multiple Views" by Quan, International Journal of Computer Vision, Vol. 19, No. 1, pp. 93-105 (1996). The affine camera model describes a three-dimensional space and images in a linear relation. It is known that the affine camera model solves problems resulting from non-linearity and provides a sufficiently good approximation of a perspective projection model if the thickness of the object is sufficiently smaller than the distance between the camera and the object.

[0005] "Euclidean Shape and Motion from Multiple Perspective Views by Affine Iterations" by Christy and Horaud, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 11, pp. 1098-1104 (1996) describes an applied example of this affine camera model in three-dimensional position estimation. In this example, the three-dimensional position of an object point is approximately estimated with an affine camera model, and nonlinear equations obtained from a perspective camera are then optimized with the approximate value employed as an initial value, thereby estimating the three-dimensional position of the object point more correctly. However, this method requires the operation of optimizing the nonlinear equations, and cannot be regarded as a simple solution.

[0006] Multiple-camera stereoscopy employing a plurality of (at least two) cameras forms another line of stereoscopy. In multiple-camera stereoscopy, it is expected that the information content increases as compared with the case of employing only two cameras and that the three-dimensional position can be estimated more stably. For example, "Shape and Motion from Image Streams under Orthography: a Factorization Method" by Tomasi and Kanade, International Journal of Computer Vision, Vol. 9, No. 2, pp. 137-154 (1992) describes an example of such multiple-camera stereoscopy. According to Tomasi et al., it is possible to estimate a three-dimensional position by a simple method called a factorization method, if a plurality of orthographic projection cameras can be assumed. If the cameras are not orthographic projection cameras, however, it is necessary to satisfy a nonlinear constraint condition called the epipolar constraint for three-dimensional position estimation. Therefore, the three-dimensional position cannot be readily estimated.

SUMMARY OF THE INVENTION 

[0007] Accordingly, a principal object of the present invention is to provide a linear estimation method for a three-dimensional position with affine camera correction, which can reduce the influence of computation errors or noise by simplifying estimation of the three-dimensional position of an object point, by employing a plurality of images and correcting projected images acquired by cameras modeled by perspective projection to images acquired by a simple linear camera model.

[0008] Briefly stated, the present invention is directed to a method of obtaining the three-dimensional position of an object point with a plurality of cameras, by acquiring images of a plurality of reference points located at known positions in a three-dimensional space with the plurality of cameras, obtaining the coordinates of the projected points on the respective images, assuming a plurality of affine cameras having a linear relation between the three-dimensional space and the images, calculating how the affine cameras project the respective reference points, and correcting the coordinates of the observed projected points to be consistent with the points projected by the affine cameras.

[0009] According to the present invention, therefore, the virtual affine cameras project all reference points once again without knowing correct projective models of the respective cameras for estimating the three-dimensional position, whereby it is possible to estimate the three-dimensional position with no regard to the types, positions and orientations of the cameras. Further, the calculation itself is simple, the three-dimensional position can be stably estimated in a linear form from the virtual affine cameras, and the influence of computation errors or noise can be reduced.

[0010] In a more preferable embodiment of the present invention, coefficients of a second order polynomial of the coordinates of the projected points are regarded as parameters for the correction, for estimating the three-dimensional position of an arbitrary object point in the three-dimensional space in response to decision of the parameters. Further, projected points of the object point are corrected in accordance with the second order polynomial to be consistent with the coordinates of the projected points by the virtual affine cameras, for estimating the three-dimensional position of the object point by linear calculation.

[0011] The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 
[0012]

Fig. 1 is a schematic block diagram showing an apparatus for carrying out an embodiment of the present invention;

Fig. 2 is a conceptual diagram showing a state of acquiring images of a point P in a three-dimensional space with N cameras;

Fig. 3 is a conceptual diagram of extended cameras;

Fig. 4 illustrates correction from projection by perspective cameras to that by affine cameras;

Fig. 5 is a flow chart of a calibration stage in the embodiment of the present invention; and

Fig. 6 is a flow chart of an estimation stage.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0013] Fig. 1 is a schematic block diagram showing an apparatus for carrying out an embodiment of the present invention.

[0014] Referring to Fig. 1, a plurality of cameras 1, 2, ..., N acquire images of a plurality of reference points located at known positions in a three-dimensional space and supply outputs thereof to an image processor 10, which in turn forms images equivalent to those acquired by virtual affine cameras and estimates a three-dimensional position by linear operation.
[0015] Fig. 2 is a conceptual diagram showing a state of acquiring images of a point P in a three-dimensional space with N cameras. The definition of a camera model is now described with reference to Fig. 2. It is assumed that P stands for the point in the three-dimensional space and p stands for the position of its projected image on each image. Expressing these as $\tilde{P}$ and $\tilde{p}$ in homogeneous coordinates respectively, the following relation is obtained:

$$\tilde{p} = M \tilde{P}$$

where M represents a 3 × 4 matrix called a projective matrix, which expresses the projective model and the positions, orientations and optical properties of the cameras. Such a camera model is called a perspective camera model if the matrix M can be decomposed as follows:

$$M = \begin{pmatrix} f_u & 0 & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R & t \end{pmatrix}$$

where $f_u$ and $f_v$ represent the focal lengths of the cameras, $u_0$ and $v_0$ represent the image centers, and R and t represent the orientation and position of the camera expressed with a rotation matrix and a translation vector, respectively. These form an ideal model for general cameras.

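[0016] When the first to third columns of the third row of a matrix $M_A$ are zero, on the other hand, such a camera model is called an affine camera model:

$$M_A = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ 0 & 0 & 0 & m_{34} \end{pmatrix}$$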
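The difference between the two models can be illustrated numerically. The following is a minimal NumPy sketch, not part of the original disclosure: the camera values, the helper `project` and the test points are all hypothetical. It checks that an affine camera maps the midpoint of two points to the midpoint of their images, whereas a perspective camera in general does not.

```python
import numpy as np

def project(M, P):
    """Project a 3D point P with a 3x4 matrix M using homogeneous coordinates."""
    p_h = M @ np.append(P, 1.0)      # homogeneous image point (3-vector)
    return p_h[:2] / p_h[2]          # divide by the third component

# Illustrative perspective camera M = K [R | t] (all values are assumptions).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 10.0])
M_persp = K @ np.hstack([R, t[:, None]])

# Illustrative affine camera: the third row is (0, 0, 0, 1), so the divisor is constant.
M_aff = np.array([[80.0, 0.0, 0.0, 320.0],
                  [0.0, 80.0, 0.0, 240.0],
                  [0.0,  0.0, 0.0,   1.0]])

P1 = np.array([1.0, 2.0, 0.5])
P2 = np.array([-1.0, 0.0, 3.0])
mid = 0.5 * (P1 + P2)

# Affine projection is linear in P (plus an offset): midpoints map to midpoints.
affine_mid_ok = np.allclose(project(M_aff, mid),
                            0.5 * (project(M_aff, P1) + project(M_aff, P2)))
# Perspective projection divides by a depth that varies with P, so it does not.
persp_mid_ok = np.allclose(project(M_persp, mid),
                           0.5 * (project(M_persp, P1) + project(M_persp, P2)))
```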



[0017] Returning from the homogeneous coordinate expression to the generally employed Euclidean coordinate expression, the coordinates $p_A$ of the projected point p on the images can be related in the affine camera model as follows, assuming that $P_A$ represents the three-dimensional coordinates of the point P:

$$p_A = C_A P_A + d$$

where $C_A$ represents a 2 × 3 matrix. An arbitrary point $P_G$ in the three-dimensional space and its projected point $p_G$ on the image are assumed here, and $P_A$ and $p_A$ are expressed in a new coordinate system with the origins at $P_G$ and $p_G$ respectively. Assuming that $P' = P_A - P_G$ and $p' = p_A - p_G$, the term d vanishes because $p_G = C_A P_G + d$, to provide the following compact expression:

$$p' = C_A P' \qquad (1)$$

Thus the three-dimensional space P' and the image p' are related to each other by the 2 × 3 matrix $C_A$ when coordinates from which $P_G$ and $p_G$ have been subtracted are employed. This matrix $C_A$ is called an affine camera matrix. If there are a number of points, for example, it is convenient to employ the center of gravity of these points as $P_G$. Under an affine camera, the projected point of the center of gravity $P_G$ in the three-dimensional space coincides with the center of gravity $p_G$ of the projected points on the images. The following description is made with reference to a coordinate system from which the center of gravity has been subtracted.
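The effect of subtracting the center of gravity can be checked with a short NumPy sketch. The matrix, offset and points below are random, hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C_A = rng.normal(size=(2, 3))        # hypothetical 2x3 affine camera matrix
d = rng.normal(size=2)               # hypothetical image offset
P = rng.normal(size=(5, 3))          # five 3D points, one per row

p = P @ C_A.T + d                    # affine projection p_A = C_A P_A + d

P_G = P.mean(axis=0)                 # center of gravity in space
p_G = p.mean(axis=0)                 # center of gravity on the image

# The projected centroid coincides with the centroid of the projections.
centroid_ok = np.allclose(C_A @ P_G + d, p_G)
# After subtracting the centroids the offset d drops out: p' = C_A P'.
centered_ok = np.allclose(p - p_G, (P - P_G) @ C_A.T)
```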

[0018] Fig. 3 is a conceptual diagram of extended cameras, and Fig. 4 is adapted to illustrate correction from projec- 
tion by perspective cameras to that by affine cameras. 

[0019] The relation between the three-dimensional space and a plurality of images acquired by the plurality of cameras is now shown. It is assumed that, when images of the point P in the three-dimensional space are acquired by the plurality of (N) cameras as described above, $p_1, \ldots, p_N$ denote the projected points of the point P in the respective cameras. These projected points, which are obtained by acquiring images of the point P with the N cameras, can also be regarded as a projection of the point P in the three-dimensional space onto a 2N-dimensional "image", as shown in Fig. 3. The cameras thus projecting the object in the three-dimensional space onto the 2N-dimensional image are referred to as extended cameras. It must be noted here that the extended cameras form a redundant observation system causing no dimensional degeneration from three dimensions.

[0020] A coordinate system having its origins at an arbitrary point (generally the center of gravity) and at its projected images is employed. It is assumed that $P_A$ and $p_i$ represent the coordinates of the point P and of its projected point on an image i, to obtain the following expression:

$$p^* = \begin{pmatrix} p_1 \\ \vdots \\ p_N \end{pmatrix} \qquad (2)$$

[0021] This can be regarded as the coordinates of the projected points in the extended cameras.
[0022] Assuming that the respective cameras are affine cameras and $C_i$ represents the affine camera matrix of a camera i, the following relation is obtained from the expression (1):

$$p_{Ai} = C_i P_A$$

[0023] As to all cameras, the relation can be described as follows:

$$p^*_A = C_V P_A \qquad (3)$$

where $C_V$ is in the following relation:

$$C_V = \begin{pmatrix} C_1 \\ \vdots \\ C_N \end{pmatrix} \qquad (4)$$

$p^*_A$ represents a 2N-vector, and $C_V$ represents a 2N × 3 matrix. The matrix $C_V$ corresponds to an affine camera matrix of the extended cameras on the assumption that the respective cameras are affine cameras.
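As a minimal sketch of this stacking (the number of cameras and all matrix entries are hypothetical), the extended matrix can be formed with `np.vstack` and compared against projecting with each camera separately:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3                                                   # number of cameras (illustrative)
C_list = [rng.normal(size=(2, 3)) for _ in range(N)]    # hypothetical C_1 ... C_N

C_V = np.vstack(C_list)                                 # 2N x 3 extended affine camera matrix
P_A = rng.normal(size=3)                                # centered 3D point

p_star_A = C_V @ P_A                                    # 2N-vector of all image coordinates
per_camera = np.concatenate([C @ P_A for C in C_list])  # same values, camera by camera
```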

[0024] If $C_V$ is known under the relation of the expression (3), the three-dimensional position $P_A$ of the point P can be solved from the images $p^*_A$. The following expression (5) holds with the pseudo-inverse matrix $C_V^+$ of $C_V$:

$$P_A = C_V^+ p^*_A \qquad (5)$$



[0025] If the observed values $p^*_A$ are noiseless, i.e., if $p^*_A$ is correctly observed, $P_A$ obtained by the expression (5) is the correct three-dimensional position. Namely, this $P_A$ is not an approximate solution; the correct three-dimensional position can be reconstructed under the assumption of the affine cameras. If $p^*_A$ has noise, on the other hand, the solution of the expression (5) is an estimate minimizing $\|p^*_A - C_V P_A\|$, i.e., minimizing errors on images acquired by the extended affine cameras. If the extended affine cameras are assumed, the relation between the coordinates $P_A$ of the three-dimensional space and the images $p^*_A$ is linear. Owing to this linear relation, the estimation is easier to analyze and remarkably more robust against noise than with a nonlinear relation.
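Both cases can be sketched with NumPy's `np.linalg.pinv`. The matrix, the point and the noise level are hypothetical; the comparison with `np.linalg.lstsq` is added here only to show that the pseudo-inverse solution is the least-squares one when the matrix has full column rank.

```python
import numpy as np

rng = np.random.default_rng(2)
C_V = rng.normal(size=(6, 3))        # hypothetical extended matrix for N = 3 cameras
C_V_pinv = np.linalg.pinv(C_V)       # 3 x 2N pseudo-inverse

P_true = np.array([0.3, -1.2, 0.7])
p_clean = C_V @ P_true
P_rec = C_V_pinv @ p_clean           # exact when the observations are noiseless

p_noisy = p_clean + 0.01 * rng.normal(size=6)
P_est = C_V_pinv @ p_noisy           # estimate minimizing ||p - C_V P|| under noise
P_lstsq = np.linalg.lstsq(C_V, p_noisy, rcond=None)[0]   # same minimizer, solved directly
```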

[0026] As described above, the three-dimensional position of the point in the three-dimensional space can be linearly reconstructed if extended affine cameras can be assumed. This holds only when affine cameras can be assumed, and it is readily inferable that the three-dimensional position cannot be correctly reconstructed from perspective cameras by the method of the expression (5), even with noiseless images. If images equivalent to those acquired by affine cameras can be formed by some method from images acquired by perspective cameras, however, it must be possible to linearly estimate the three-dimensional position with such images.

[0027] Images acquired by perspective cameras are here corrected to those equivalent to images acquired by affine cameras. While an epipolar geometric constraint may be employed for this correction, a correction method which is as simple as possible is employed here. The following description is made with reference to extended cameras.

[0028] As shown in Fig. 4, it is assumed that $p^*$ represents images of the point P in the three-dimensional space acquired by the actual N cameras shown in Fig. 1. Considering virtual extended affine cameras provided by the matrix $C_V$, it is assumed that $p^*_A$ represents images of the point P acquired by these cameras. The images $p^*$ are corrected to be consistent with $p^*_A$. While various methods may be employed for this correction, the k-th element $(p^*_A)_k$ ($1 \le k \le 2N$) of $p^*_A$ is approximated with a second order polynomial of $p^*$, i.e., modeled by the following expression (6):

$$(p^*_A)_k = \tilde{p}^{*T} T_k \tilde{p}^* \qquad (6)$$

where

$$\tilde{p}^* = \begin{pmatrix} p^* \\ 1 \end{pmatrix}$$

and $T_k$ represents a (2N + 1) × (2N + 1) symmetric matrix.

[0029] Each symmetric matrix $T_k$ has (2N + 1) × (N + 1) independent elements, so that the degree of freedom of T, i.e., of all 2N matrices $T_k$, is 2N × (2N + 1) × (N + 1) in total, which is conceivably sufficient to describe a perspective camera model. It is to be noted that T is common to all points. T is a parameter decided by the projection method of the actual cameras and by the properties and arrangement of the cameras, such as the positions and orientations thereof. Therefore, this parameter T can be continuously used unless the cameras are moved or the focal lengths thereof are varied.
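The quadratic form and the parameter count can be made concrete with a small sketch. The helper names `augmented`, `correct_component` and `free_parameters` are hypothetical and do not appear in the original; the example matrix is chosen so that the correction simply returns the first coordinate.

```python
import numpy as np

def augmented(p_star):
    """Return the (2N+1)-vector [p*; 1] used in the quadratic form."""
    return np.append(np.asarray(p_star, dtype=float), 1.0)

def correct_component(T_k, p_star):
    """Evaluate one corrected coordinate as p~^T T_k p~."""
    p_t = augmented(p_star)
    return float(p_t @ T_k @ p_t)

def free_parameters(N):
    """Independent entries of 2N symmetric (2N+1) x (2N+1) matrices."""
    n = 2 * N + 1
    per_matrix = n * (n + 1) // 2          # equals (2N+1)(N+1)
    return 2 * N * per_matrix

# A T_k that keeps only the linear term of coordinate 0 reproduces that coordinate.
N = 2
T_0 = np.zeros((2 * N + 1, 2 * N + 1))
T_0[0, -1] = T_0[-1, 0] = 0.5              # symmetric pair; contributes 1.0 * p*[0]
value = correct_component(T_0, [3.0, 1.0, -2.0, 4.0])
```

For two cameras this count gives 60 unknowns in total, 15 per corrected coordinate, which indicates how many reference points a calibration needs at minimum.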

[0030] A three-dimensional reconstruction method is now described in detail. In order to concretely decide the three-dimensional position of the object point, two stages are necessary. The first stage is a calibration stage for deciding the parameter $C_V$ of the virtual extended affine cameras and the parameter T for correction. The second stage is an estimation stage for correcting the images acquired by the actual cameras to be equivalent to those projected by the extended affine cameras and linearly estimating the three-dimensional position of the object point.

[0031] Figs. 5 and 6 are flow charts of the calibration and estimation stages respectively. The image processor 10 shown in Fig. 1 carries out a program based on these flow charts.

[0032] The calibration stage is now described with reference to Fig. 5. In order to reconstruct three-dimensional position information, the extended affine camera matrix $C_V$ and the matrix T for correction must be obtained. This operation can be regarded as system calibration. Employed therefor are M reference points $P_1, \ldots, P_M$ located at previously known three-dimensional positions and projected images $p^*_1, \ldots, p^*_M$ thereof obtained by the N cameras. Either a lattice for calibration or an arbitrary sequence of points may be employed. Considering that the inventive method is an approximate three-dimensional reconstruction method, however, the reference points $P_1, \ldots, P_M$ are preferably set in an area not much different from that of the three-dimensional space to be estimated in practice.

[0033] The virtual extended affine camera matrix $C_V$ is first obtained from the sequence of reference points $P_1, \ldots, P_M$. Basically the matrix $C_V$ may be any matrix, since images acquired by affine cameras can be converted to those acquired by arbitrary affine cameras by affine transformation. In consideration of the fact that the inventive method is an approximation, the parameters of the virtual affine cameras may be selected to be the best approximation of the actual cameras. Assuming that all cameras are affine cameras, there must be the following relation:

$$C_V [P_1 \cdots P_M] = [p^*_1 \cdots p^*_M]$$

$P_1, \ldots, P_M$ and $p^*_1, \ldots, p^*_M$ are coordinates normalized by subtracting the center of gravity. This equation is solved so as to minimize the total $\|[p^*_1 \cdots p^*_M] - C_V [P_1 \cdots P_M]\|$ of errors of all points in the extended cameras, for obtaining the following expression (7):

$$C_V = [p^*_1 \cdots p^*_M][P_1 \cdots P_M]^+ \qquad (7)$$
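The computation of expression (7) can be sketched as follows. The function name `calibrate_affine`, the row-per-point array layout and the synthetic data are assumptions made for illustration; the check uses truly affine cameras, for which the fitted matrix should equal the one that generated the data.

```python
import numpy as np

def calibrate_affine(P_ref, p_ref):
    """Estimate C_V from M reference points.

    P_ref: (M, 3) known 3D positions; p_ref: (M, 2N) stacked image coordinates.
    Returns C_V (2N x 3) and both centers of gravity.
    """
    P_G = P_ref.mean(axis=0)
    p_G = p_ref.mean(axis=0)
    P_c = (P_ref - P_G).T                 # 3 x M, columns P_1 ... P_M
    p_c = (p_ref - p_G).T                 # 2N x M, columns p*_1 ... p*_M
    C_V = p_c @ np.linalg.pinv(P_c)       # expression (7)
    return C_V, P_G, p_G

# Synthetic check with truly affine cameras (all values are assumptions).
rng = np.random.default_rng(3)
C_true = rng.normal(size=(4, 3))                  # N = 2 cameras
P_ref = rng.normal(size=(10, 3))
p_ref = P_ref @ C_true.T + rng.normal(size=4)     # constant offset per image coordinate
C_V, P_G, p_G = calibrate_affine(P_ref, p_ref)
```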

[0034] Then, the matrix T for correction is obtained. Assuming that the extended affine cameras $C_V$ acquire images of $P_j$, the point is projected onto $p^*_{Aj} = C_V P_j$ on the images, and $p^*_j$ is corrected to obtain $p^*_{Aj}$. As hereinabove described, this correction is carried out in accordance with the following expression:

$$(p^*_{Aj})_k = \tilde{p}^{*T}_j T_k \tilde{p}^*_j$$

Since $p^*_j$ and $p^*_{Aj}$ ($1 \le j \le M$) are obtained, a linear equation may be solved for the (2N + 1) × (N + 1) elements of each $T_k$.
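One way to set up this linear equation, shown here only as a sketch, is to expand the quadratic form into monomials of the augmented vector and solve by least squares. The helper names `quad_features` and `fit_correction` and the use of `np.linalg.lstsq` are assumptions, not steps stated in the original.

```python
import numpy as np

def quad_features(p_star):
    """Monomials p~_a p~_b (a <= b) of the augmented vector p~ = [p*; 1]."""
    p_t = np.append(p_star, 1.0)
    ia, ib = np.triu_indices(p_t.size)
    return p_t[ia] * p_t[ib]

def fit_correction(p_obs, p_aff):
    """Least-squares fit of T_k for every output coordinate k.

    p_obs: (M, 2N) centered observed coordinates p*_j.
    p_aff: (M, 2N) targets p*_Aj = C_V P_j.
    Returns an array of shape (2N, 2N+1, 2N+1) holding symmetric matrices.
    """
    A = np.array([quad_features(p) for p in p_obs])     # M x (2N+1)(N+1)
    W = np.linalg.lstsq(A, p_aff, rcond=None)[0]        # one column of weights per k
    n = p_obs.shape[1] + 1
    ia, ib = np.triu_indices(n)
    T = np.zeros((p_obs.shape[1], n, n))
    for k in range(p_obs.shape[1]):
        T[k, ia, ib] = W[:, k]                          # fill the upper triangle
        T[k] = 0.5 * (T[k] + T[k].T)                    # share off-diagonal weights symmetrically
    return T
```

The fit needs at least (2N + 1) × (N + 1) reference points per coordinate to be determined, and more in the presence of noise.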

[0035] In the aforementioned manner, the virtual affine camera matrix $C_V$ and the correction matrices $T_k$ ($1 \le k \le 2N$) are decided from the three-dimensional positions $P_j$ of the M reference points and the projected images $p^*_j$ thereof by the N cameras.

[0036] The three-dimensional position estimation method is now described with reference to Fig. 6. Similarly to the aforementioned calibration stage, the N cameras acquire images of the object point to obtain projected images $q^*$ whose coordinates are normalized in consideration of the center of gravity $P_G$ of the reference points and the center of gravity $p^*_G$ in each camera. In order to obtain projected images $q^*_A$ equivalent to those acquired by the virtual extended affine cameras $C_V$, correction is made in accordance with the following expression:

$$(q^*_A)_k = \tilde{q}^{*T} T_k \tilde{q}^* \quad (1 \le k \le 2N)$$

[0037] With the obtained $q^*_A$, the three-dimensional position Q is estimated in accordance with the following expression (8):

$$Q = C_V^+ q^*_A \qquad (8)$$
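The two stages can be tried end to end on synthetic data. The sketch below is an illustration under stated assumptions, not the implementation of the embodiment: the two perspective cameras, the reference volume, the helper names and the `rcond` cutoff (used because the quadratic monomials are nearly dependent for exact stereo data) are all hypothetical. It compares the mean reconstruction error with and without the second order correction.

```python
import numpy as np

def quad_features(p_star):
    """Monomials of the augmented vector [p*; 1], one per free element of T_k."""
    p_t = np.append(p_star, 1.0)
    ia, ib = np.triu_indices(p_t.size)
    return p_t[ia] * p_t[ib]

def observe(cams, P):
    """Stack the perspective projections of point P in all cameras (2N-vector)."""
    out = []
    for M in cams:
        h = M @ np.append(P, 1.0)
        out.extend(h[:2] / h[2])
    return np.array(out)

def make_cam(angle):
    """Hypothetical perspective camera rotated about the Y axis, 10 units away."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    t = np.array([0.0, 0.0, 10.0])
    K = np.diag([800.0, 800.0, 1.0])
    return K @ np.hstack([R, t[:, None]])

rng = np.random.default_rng(5)
cams = [make_cam(-0.3), make_cam(0.3)]

# Calibration stage: reference points at known positions.
P_ref = rng.uniform(-1.0, 1.0, size=(60, 3))
p_ref = np.array([observe(cams, P) for P in P_ref])
P_G, p_G = P_ref.mean(axis=0), p_ref.mean(axis=0)
P_c, p_c = P_ref - P_G, p_ref - p_G
C_V = p_c.T @ np.linalg.pinv(P_c.T)                     # expression (7)
A = np.array([quad_features(p) for p in p_c])
W = np.linalg.lstsq(A, P_c @ C_V.T, rcond=1e-10)[0]     # weights equivalent to the T_k

# Estimation stage: new object points inside the calibrated volume.
Q_test = rng.uniform(-0.8, 0.8, size=(20, 3))
q_star = np.array([observe(cams, Q) for Q in Q_test]) - p_G
q_star_A = np.array([quad_features(q) for q in q_star]) @ W   # corrected coordinates
C_pinv = np.linalg.pinv(C_V)
Q_corrected = q_star_A @ C_pinv.T + P_G                 # linear estimate after correction
Q_uncorrected = q_star @ C_pinv.T + P_G                 # same estimate without correction

err_corrected = np.linalg.norm(Q_corrected - Q_test, axis=1).mean()
err_uncorrected = np.linalg.norm(Q_uncorrected - Q_test, axis=1).mean()
```

Because the coordinates are centered during calibration, the center of gravity `P_G` is added back to obtain positions in the original coordinate system.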

[0038] According to the embodiment of the present invention, the virtual affine cameras project all reference points 
once again without knowing correct projective models of the respective cameras for estimating the three-dimensional 
position, whereby it is possible to estimate the three-dimensional position with no regard to the types, positions and ori- 
entations of the cameras. Further, the calculation itself is simple, and the three-dimensional position can be stably esti- 
mated in a linear form from the virtual affine cameras. Thus, influence by computational errors or noise can be reduced. 
[0039] Although the present invention has been described and illustrated in detail, it is clearly understood that the 
same is by way of illustration and example only and is not to be taken by way of limitation, the spirit and scope of the 
present invention being limited only by the terms of the appended claims. 
Claims

1. A linear estimation method for obtaining a three-dimensional position of an object point with a plurality of cameras
(1, 2, ..., N), including:

a first step of acquiring images of a plurality of reference points being located on known positions in said three- 
dimensional space by said plurality of cameras and obtaining coordinates of projected points thereof on 
respective said images; and 

a second step of assuming a plurality of affine cameras having linear relation between said three-dimensional 
space and images, calculating how said reference points are projected by said affine cameras and correcting 
said coordinates of said projected points to be consistent with said projected points. 

2. The linear estimation method for a three-dimensional position in accordance with claim 1, wherein 

said second step regards coefficients of a second order polynomial of said coordinates of said projected
points as parameters for correction.

3. The method of linearly estimating a three-dimensional position in accordance with claim 2, further including a step 
of estimating the three-dimensional position of an arbitrary object point in said three-dimensional space in 
response to decision of said parameters for correction. 

4. The method of linearly estimating a three-dimensional position in accordance with claim 2 or 3, further including a
step of correcting projected points of said object point with said second order polynomial to be consistent with the
coordinates of projected points in said virtual affine cameras and estimating the three-dimensional position of said
object point by linear calculation.